DL-$elect: A Decision-List-Based Data-Mining System

Author

  • Karl Weinmeister
Abstract

The application of machine-learning algorithms to financial markets has grown in popularity in recent years. Most systems built to select stocks have used neural-network techniques. Our research examines the feasibility of inductive-logic approaches and the creation of a decision-list-based data-mining system, DL-$elect. Neural networks can model a variety of data distributions and handle inconsistent data well, but for complex problems such as financial analysis, the structure of a neural network can be difficult to interpret. Decision lists (Rivest 1987), by contrast, are represented in an easily understood form: an extended “if-then-elseif-...else” rule.
Iterative algorithms for decision lists append rules to a list and remove from the data set the examples covered by those rules. Because later rules are learned only from the examples that remain, effective learning depends on early rule selection; a poor early choice can reduce the accuracy of the entire decision list. The algorithm described by Rivest (1987) avoids this obstacle by assuming that rules are 100% accurate on the training data, but consequently leaves open the problem of noisy data. The learning algorithm used in DL-$elect, BruteDL (Segal and Etzioni 1994), addresses this and other issues by conducting a single search for homogeneous rules, that is, rules whose accuracy is independent of list position. Since homogeneous rules need not be 100% accurate, BruteDL is better suited to handle the noise of financial-market data.
There has been significant debate in the financial and academic communities regarding the efficiency of financial markets. The efficient-markets hypothesis asserts that stock prices already reflect all available information, rendering forecasting attempts useless. DL-$elect is based on the notion that markets do in fact exhibit short-term inefficiencies: trends from the previous week of activity carry over to the next week.
The portfolio produced by DL-$elect is not intended for a buy-and-hold strategy; rather, it is meant to be revised weekly. By using fresh data each week, DL-$elect avoids the issue of non-stationarity, in which statistical properties of the market change with time. DL-$elect needs two key data elements: a list of stock attributes, such as price/earnings ratio, and a corresponding list of price changes acquired one week later. DL-$elect assembles 11 attributes for 600 stocks into the attribute list, inserts the price-change data, and cleans any malformed data. Next, stocks in the resulting data file are labeled as excellent if they perform in the top α% (in our simulation, α=20), since BruteDL is a classification algorithm that requires a category to predict (John and Miller 1996). The data file is then randomly partitioned into 60% training, 30% testing, and 10% pruning sets and entered into BruteDL. The generated rules determine which variables make an “excellent” stock.
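The “if-then-elseif-...else” form of a decision list mentioned above can be illustrated with a minimal Python sketch. It is not BruteDL and does not reproduce any rule learned by DL-$elect; the attribute names (`pe_ratio`, `weekly_change`), the thresholds, and the labels are hypothetical and chosen only to show that the first matching rule decides the class.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Stock = Dict[str, float]


@dataclass
class Rule:
    description: str                    # human-readable form of the test
    condition: Callable[[Stock], bool]  # the "if" part
    label: str                          # the class predicted when the test fires


def classify(decision_list: List[Rule], default: str, stock: Stock) -> str:
    """Return the label of the first rule that fires, else the default class."""
    for rule in decision_list:
        if rule.condition(stock):
            return rule.label
    return default  # the final "else" branch


# Hypothetical rules and thresholds, for illustration only.
rules = [
    Rule("pe_ratio > 40", lambda s: s["pe_ratio"] > 40, "other"),
    Rule("weekly_change > 3", lambda s: s["weekly_change"] > 3, "excellent"),
]

# Both rules match this stock, but the earlier rule takes precedence.
overpriced = classify(rules, "other", {"pe_ratio": 50.0, "weekly_change": 5.0})
# Only the second rule matches here.
rising = classify(rules, "other", {"pe_ratio": 15.0, "weekly_change": 5.0})
# No rule matches, so the default class is returned.
flat = classify(rules, "other", {"pe_ratio": 15.0, "weekly_change": 0.0})
```

Because evaluation stops at the first match, reordering the rules can change the prediction, which is why early rule selection matters in iterative learners.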
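The labeling and partitioning steps described above can be sketched as follows. This is a hedged illustration rather than the system's actual code: the function names, the ticker strings, the `"other"` label for non-excellent stocks, and the use of a seeded shuffle are all assumptions; only α=20 and the 60/30/10 proportions come from the text.

```python
import random
from typing import Dict, List, Tuple


def label_top_percent(price_changes: Dict[str, float],
                      alpha_percent: int = 20) -> Dict[str, str]:
    """Label the top alpha_percent of stocks by price change as 'excellent'."""
    ranked = sorted(price_changes, key=price_changes.get, reverse=True)
    n_top = len(ranked) * alpha_percent // 100  # integer math avoids rounding surprises
    top = set(ranked[:n_top])
    return {t: ("excellent" if t in top else "other") for t in ranked}


def partition(items: List[str],
              seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Randomly split items into 60% training, 30% testing, 10% pruning."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)  # seeded only to make the sketch repeatable
    n_train = len(shuffled) * 60 // 100
    n_test = len(shuffled) * 30 // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    prune = shuffled[n_train + n_test:]  # whatever remains, about 10%
    return train, test, prune


# Synthetic stand-in for one week of price changes on 600 stocks.
price_changes = {f"S{i}": float(i) for i in range(600)}
labels = label_top_percent(price_changes)
train, test, prune = partition(list(price_changes))
```

With 600 stocks this yields 120 stocks labeled excellent and splits of 360, 180, and 60 stocks.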


Similar resources

Development of a Combined System Based on Data Mining and Semantic Web for the Diagnosis of Autism

Introduction: Autism is a nervous system disorder, and since there is no direct diagnosis for it, data mining can help diagnose the disease. Ontology as a backbone of the semantic web, a knowledge database with shareability and reusability, can be a confirmation of the correctness of disease diagnosis systems. This study aimed to provide a system for diagnosing autistic children with a combinat...


Applying a decision support system for accident analysis by using data mining approach: A case study on one of the Iranian manufactures

Uncertain and stochastic states have been always taken into consideration in the fields of risk management and accident, like other fields of industrial engineering, and have made decision making difficult and complicated for managers in corrective action selection and control measure approach. In this research, huge data sets of the accidents of a manufacturing and industrial unit have been st...


Data Mining and CBR: Complimentary methodologies for building information enhanced decision support system

This paper deals with a help-desk application development for prevention of occupational injuries and thereby minimize sufferings and costs for the society. We have employed data mining and CBR (Case Based Reasoning) techniques as complimentary methodologies for building the application. In the work we have used a database on work related injuries in Sweden. For the data mining part we have use...


ART: A Hybrid Classification Model

This paper presents a new family of decision list induction algorithms based on ideas from the association rule mining context. ART, which stands for ‘Association Rule Tree’, builds decision lists that can be viewed as degenerate, polythetic decision trees. Our method is a generalized “Separate and Conquer” algorithm suitable for Data Mining applications because it makes use of efficient and sc...



Publication date: 1998